Empowering Imbalanced Data in Supervised Learning: A Semi-supervised Learning Approach
نویسندگان
چکیده
We present a framework to address the imbalanced data problem using semi-supervised learning. Specifically, from a supervised problem, we create a semi-supervised problem and then use a semi-supervised learning method to identify the most relevant instances to establish a welldefined training set. We present extensive experimental results, which demonstrate that the proposed framework significantly outperforms all other sampling algorithms in 67% of the cases across three different classifiers and ranks second best for the remaining 33% of the cases.
منابع مشابه
Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk
This study explores a semi-supervised classification approach using random forest as a base classifier to classify the low-back disorders (LBDs) risk associated with the industrial jobs. Semi-supervised classification approach uses unlabeled data together with the small number of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...
متن کاملSemi-Supervised Self-training Approaches for Imbalanced Splice Site Datasets
Machine Learning algorithms produce accurate classifiers when trained on large, balanced datasets. However, it is generally expensive to acquire labeled data, while unlabeled data is available in much larger amounts. A cost-effective alternative is to use Semi-Supervised Learning, which uses unlabeled data to improve supervised classifiers. Furthermore, for many practical problems, data often e...
متن کاملSemi-Supervised Learning for Imbalanced Sentiment Classification
Various semi-supervised learning methods have been proposed recently to solve the long-standing shortage problem of manually labeled data in sentiment classification. However, most existing studies assume the balance between negative and positive samples in both the labeled and unlabeled data, which may not be true in reality. In this paper, we investigate a more common case of semi-supervised ...
متن کاملAgreement/Disagreement Classification: Exploiting Unlabeled Data using Contrast Classifiers
Several semi-supervised learning methods have been proposed to leverage unlabeled data, but imbalanced class distributions in the data set can hurt the performance of most algorithms. In this paper, we adapt the new approach of contrast classifiers for semi-supervised learning. This enables us to exploit large amounts of unlabeled data with a skewed distribution. In experiments on a speech act ...
متن کاملImbalanced Learning
With the continuous expansion of data availability in many large-scale, complex, and networked systems, it becomes critical to advance raw data from fundamental research on the Big Data challenge to support decision-making processes. Although existing machine-learning and data-mining techniques have shown great success in many real-world applications, learning from imbalanced data is a relative...
متن کامل